Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30285 Default cloud logical files to compressed #17770

Conversation

jakesmith
Copy link
Member

@jakesmith jakesmith commented Sep 14, 2023

Configurable via plane attribute 'compressLogicalFiles'

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@jakesmith jakesmith requested a review from ghalliday September 14, 2023 11:59
@github-actions
Copy link

@jakesmith
Copy link
Member Author

@ghalliday - please review.

As part of this, I added a new exception and restructured the exception headers (and numbers for Thor exception codes).
Should probably be extracted into a separate PR, but please comment.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith looks good. Main comment/question is whether checking the consistency of the compress options is helpful.

$$.setExpr(createAttribute(__compressed__Atom));
$$.setPosition($1);
| compressionOptions
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

production is better removed - i.e. remove { ... }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove

@@ -11763,6 +11763,7 @@ static void getTokenText(StringBuffer & msg, int token)
case TOK_TRUE: msg.append("TRUE"); break;
case TYPE: msg.append("TYPE"); break;
case TYPEOF: msg.append("TYPEOF"); break;
case UNCOMPRESSED: msg.append("UNCOMPRESSED"); break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs to be added to reservedwords.cpp

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add

clusterHandler.setown(new CHThorClusterWriteHandler(lfn, "OUTPUT", agent));
}
clusterHandler->addCluster(cluster);
Owned<IPropertyTree> plane = getStoragePlane(cluster);
bool thisPlaneCompressed = plane ? plane->getPropBool("@compressLogicalFiles") : false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to indicate this is checking the consistency, rather than determining whether it should be compressed. (I wondered why it wasn't using a default.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add comment

if (1 == clusterIdx)
outputPlaneCompressed = thisPlaneCompressed;
else if (outputPlaneCompressed != thisPlaneCompressed)
throw makeStringExceptionV(ENGINEERR_MIXED_COMPRESSED_WRITE, "Writing to planes with mixed compression is not supported %s", lfn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a helpful error, or should it just use the compression from the 1st plane? The option on the plane only provides the default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do, I think this is unlikely to be hit, because we don't support multiple plane output in cloud in other places at the moment (e.g. :

throw MakeStringException(0, "Container mode does not yet support output to multiple clusters while writing file %s)",
)

But, I thought it better to check and throw an early error, rather than continue, and at some later point cause confusion.

{
StringArray clusterNames;
clusterHandler->getClusters(clusterNames);
planeName.set(clusterNames.item(0)); // NB: only bother with 1st, if multilple createClusterWriteHandler validates if same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: multiple

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix

}
else
getDefaultStoragePlane(planeName);
bool outputCompressionDefault = agent.queryWorkUnit()->getDebugValueBool("compressAllOutputs", isContainerized());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the workunit option useful? Not convinced, but may as well leave it in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isn't useful in cloud, but it would the current way to change the default in bare-metal since you can't configure planes there.

@jakesmith jakesmith requested a review from ghalliday September 15, 2023 12:08
@jakesmith
Copy link
Member Author

@ghalliday - please see new commit and replies to comments.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. I don't think throwing an error is very helpful - they can write files with a different compression by using an ecl attribute - so it isn't something that should be illegal. Maybe a warning - but then again I think it is very unlikely to be hit.

@jakesmith jakesmith requested a review from ghalliday September 22, 2023 13:39
@jakesmith
Copy link
Member Author

Changes look good. I don't think throwing an error is very helpful - they can write files with a different compression by using an ecl attribute - so it isn't something that should be illegal. Maybe a warning - but then again I think it is very unlikely to be hit.

agree, not helpful/right, have removed multi-plane consistency checks and defaulted the default to 1st plane.
@ghalliday - please review.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakesmith please squash

@ghalliday
Copy link
Member

@jakesmith xmlread.ecl failing may be relevant...

@jakesmith jakesmith force-pushed the HPCC-30285-default-lfn-compress branch from a107825 to f4464d0 Compare September 26, 2023 18:37
@jakesmith
Copy link
Member Author

@jakesmith xmlread.ecl failing may be relevant...

@ghalliday - it was being caused wrong #onwarning numbers in the regression tests, which was due to the movement of the thor error number range (which was previously overlapped with another range in errorlist.h).
Please see last commit.

Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash

Configurable via plane attribute 'compressLogicalFiles'

Signed-off-by: Jake Smith <[email protected]>
@jakesmith jakesmith force-pushed the HPCC-30285-default-lfn-compress branch from f4464d0 to 60885c5 Compare September 29, 2023 08:10
@jakesmith
Copy link
Member Author

@ghalliday - squashed.

@ghalliday
Copy link
Member

I've checked there are no problems with the compiler regression suite.

@ghalliday ghalliday merged commit f579ce6 into hpcc-systems:candidate-9.4.x Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants